Goto

Collaborating Authors

 sense amplifier


FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks

arXiv.org Artificial Intelligence

Convolutional Neural Networks (CNNs) demonstrate great performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization. They replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, though TWNs have higher accuracy and better sparsity, IMC acceleration for TWNs has limited research. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized, and the addition operation is not efficient. In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to the memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of both activations and weights and increase the parallelism of memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency and 1.22X area efficiency compared with State-Of-The-Art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% sparsity


PIM-DRAM:Accelerating Machine Learning Workloads using Processing in Memory based on DRAM Technology

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) have gained significant interest in the recent past for plethora of applications such as image and video analytics, language translation, and medical diagnosis. High memory bandwidth is required to keep up with the needs of data-intensive DNN applications when implemented on a von-Neumann hardware architecture as majority of the data resides in the main memory. Therefore, processing in memory can provide a promising solution for the memory wall bottleneck for ML workloads. In this work, we propose a DRAM-based processing-in-memory (PIM) multiplication primitive coupled with intra-bank accumulation to accelerate matrix vector operations in ML workloads. Moreover, we propose a processing-in-memory DRAM bank architecture, data mapping and dataflow based on the proposed primitive. System evaluations performed on networks like AlexNet, VGG16 and ResNet18 show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU and an ideal conventional (non-PIM) baseline architecture with infinite compute bandwidth, respectively.


An Analog VLSI Chip for Radial Basis Functions

Neural Information Processing Systems

We have designed, fabricated, and tested an analog VLSI chip which computes radial basis functions in parallel. We have developed a synapse circuit that approximates a quadratic function. We aggregate these circuits to form radial basis functions. These radial basis functions are then averaged together using a follower aggregator.


An Analog VLSI Chip for Radial Basis Functions

Neural Information Processing Systems

We have designed, fabricated, and tested an analog VLSI chip which computes radial basis functions in parallel. We have developed a synapse circuit that approximates a quadratic function. We aggregate these circuits to form radial basis functions. These radial basis functions are then averaged together using a follower aggregator.


An Analog VLSI Chip for Radial Basis Functions

Neural Information Processing Systems

We have designed, fabricated, and tested an analog VLSI chip which computes radial basis functions in parallel. We have developed asynapse circuit that approximates a quadratic function. We aggregate these circuits to form radial basis functions. These radial basis functions are then averaged together using a follower aggregator. 1 INTRODUCTION Radial basis functions (RBFs) are a mel hod for approximating a function from scattered training points [Powell, H)87]. RBFs have been used to solve recognition and prediction problems with a fair amonnt of success [Lee, 1991] [Moody, 1989] [Platt, 1991]. The first layer of an RBF network computes t.he distance of the input to the network to a set of stored memories. Each basis function is a nonlinear function of a corresponding distance. Tht basis functions are then added together with second-layer weights to produce the output of the network.